SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos
In this paper, we introduce SoccerNet, a benchmark for action spotting in
soccer videos. The dataset is composed of 500 complete soccer games from six
main European leagues, covering three seasons from 2014 to 2017 and a total
duration of 764 hours. A total of 6,637 temporal annotations are automatically
parsed from online match reports at a one-minute resolution for three main
classes of events (Goal, Yellow/Red Card, and Substitution). As such, the
dataset is easily scalable. These annotations are manually refined to a
one-second resolution by anchoring them at a single timestamp following
well-defined soccer rules. With an average of one event every 6.9 minutes, this
dataset focuses on the problem of localizing very sparse events within long
videos. We define the task of spotting as finding the anchors of soccer events
in a video. Making use of recent developments in the realm of generic action
recognition and detection in video, we provide strong baselines for detecting
soccer events. We show that our best model for classifying temporal segments of
length one minute reaches a mean Average Precision (mAP) of 67.8%. For the
spotting task, our baseline reaches an Average-mAP of 49.7% for tolerances
ranging from 5 to 60 seconds. Our dataset and models are available at
https://silviogiancola.github.io/SoccerNet.
Comment: CVPR Workshop on Computer Vision in Sports, 2018
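The spotting metric rewards predictions that land within a temporal tolerance of a ground-truth anchor. Below is a minimal Python sketch of that matching step, assuming a greedy one-to-one assignment; the full Average-mAP additionally ranks predictions by confidence and averages precision over classes, so the function here is illustrative rather than the official SoccerNet evaluation code.

```python
import numpy as np

def spotting_recall(pred_times, gt_times, tolerance):
    """Fraction of ground-truth anchors (in seconds) hit by a prediction
    within +/- tolerance; each ground truth is matched at most once."""
    matched = set()
    for p in sorted(pred_times):
        candidates = [i for i, g in enumerate(gt_times)
                      if i not in matched and abs(p - g) <= tolerance]
        if candidates:
            matched.add(min(candidates, key=lambda i: abs(p - gt_times[i])))
    return len(matched) / max(len(gt_times), 1)

# Average the score over tolerances from 5 s to 60 s, as in the paper.
preds, gts = [12.0, 95.0, 400.0], [10.0, 100.0, 410.0]
tolerances = np.arange(5, 65, 5)
avg_score = np.mean([spotting_recall(preds, gts, t) for t in tolerances])
```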
Integration of Absolute Orientation Measurements in the KinectFusion Reconstruction pipeline
In this paper, we show how absolute orientation measurements provided by
low-cost but high-fidelity IMU sensors can be integrated into the KinectFusion
pipeline. We show that this integration improves the runtime, robustness, and
quality of the 3D reconstruction. In particular, we use this orientation data
to seed and regularize the ICP registration technique. We also present a
technique to filter the pairs of 3D matched points based on the distribution of
their distances. This filter is implemented efficiently on the GPU. Estimating
the distribution of the distances helps control the number of iterations
necessary for the convergence of the ICP algorithm. Finally, we show
experimental results that highlight improvements in robustness, a speed-up of
almost 12%, and a gain in tracking quality of 53% for the ATE metric on the
Freiburg benchmark.
Comment: CVPR Workshop on Visual Odometry and Computer Vision Applications Based on Location Clues, 2018
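The distance-based pair filter can be pictured as outlier rejection on ICP residuals. A minimal NumPy sketch follows, assuming a simple k-sigma rule as a stand-in for the paper's distribution-based filter; the actual implementation runs on the GPU, and the same distribution estimate is what controls the number of ICP iterations.

```python
import numpy as np

def filter_matched_pairs(src, dst, k=2.0):
    """Reject matched 3D point pairs whose residual distance is an
    outlier under the empirical distance distribution (k-sigma rule).
    src, dst: (N, 3) arrays of corresponding points."""
    d = np.linalg.norm(src - dst, axis=1)
    keep = np.abs(d - d.mean()) <= k * d.std()
    return src[keep], dst[keep]
```

Tracking the mean residual across iterations also gives a natural stopping signal: once it stops shrinking, the ICP loop can terminate early.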
A metrological characterization of the Kinect V2 time-of-flight camera
A metrological characterization process for time-of-flight (TOF) cameras is proposed in this paper and applied to the Microsoft Kinect V2. Based on the Guide to the Expression of Uncertainty in Measurement (GUM), the uncertainty of a three-dimensional (3D) scene reconstruction is analysed. In particular, the random and the systematic components of the uncertainty are evaluated for the single sensor pixel and for the complete depth camera. The manufacturer declares an uncertainty in the measurement of the central pixel of the sensor of about a few millimetres (Kinect for Windows Features, 2015), which is considerably better than the first version of the Microsoft Kinect (Chow et al., 2012 [1]). This work points out that performance is highly influenced by the measuring conditions and the environmental parameters of the scene; in fact, the 3D point reconstruction uncertainty can vary from 1.5 mm to tens of millimetres.
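The decomposition into random and systematic components can be illustrated with repeated acquisitions of a static scene: the per-pixel spread over frames estimates the random part, while the bias of the temporal mean against a reference depth estimates the systematic part. A minimal sketch under those assumptions (a full GUM analysis propagates further uncertainty sources):

```python
import numpy as np

def depth_uncertainty(frames, reference):
    """Per-pixel uncertainty components from M depth maps of a static scene.
    frames:    (M, H, W) repeated depth measurements in millimetres
    reference: (H, W) reference depth, e.g. from a calibrated target
    Returns (random, systematic): temporal std and bias of the mean."""
    random_u = frames.std(axis=0, ddof=1)
    systematic_u = frames.mean(axis=0) - reference
    return random_u, systematic_u
```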
MVTN: Learning Multi-View Transformations for 3D Understanding
Multi-view projection techniques have proven highly effective at achieving
state-of-the-art results in 3D shape recognition. These methods learn how to
combine information from multiple view-points.
However, the camera view-points from which these views are obtained are often
fixed for all shapes. To overcome the static nature of current multi-view
techniques, we propose learning these view-points. Specifically, we introduce
the Multi-View Transformation Network (MVTN), which uses differentiable
rendering to determine optimal view-points for 3D shape recognition. As a
result, MVTN can be trained end-to-end with any multi-view network for 3D shape
classification. We integrate MVTN into a novel adaptive multi-view pipeline
that is capable of rendering both 3D meshes and point clouds. Our approach
demonstrates state-of-the-art performance in 3D classification and shape
retrieval on several benchmarks (ModelNet40, ScanObjectNN, ShapeNet Core55).
Further analysis indicates that our approach exhibits improved robustness to
occlusion compared to other methods. We also investigate additional aspects of
MVTN, such as 2D pretraining and its use for segmentation. To support further
research in this area, we have released MVTorch, a PyTorch library for 3D
understanding and generation using multi-view projections.
Comment: Under review; journal extension of the ICCV 2021 paper, arXiv:2011.13244
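At the heart of MVTN is a small regressor mapping a coarse global shape feature to per-shape camera poses, which drive a differentiable renderer so that the classifier's gradients reach the view-points. A minimal PyTorch sketch of such a regressor, with illustrative names and sizes (rendering and the multi-view backbone are omitted):

```python
import torch
import torch.nn as nn

class ViewPointRegressor(nn.Module):
    """Predict bounded (azimuth, elevation) offsets for n_views cameras
    from a global shape feature, making the view-points learnable."""
    def __init__(self, feat_dim=256, n_views=8, max_offset_deg=90.0):
        super().__init__()
        self.n_views, self.max_offset = n_views, max_offset_deg
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 2 * n_views),  # 2 angles per view
        )

    def forward(self, shape_feat):                 # (B, feat_dim)
        offsets = torch.tanh(self.mlp(shape_feat)) * self.max_offset
        return offsets.view(-1, self.n_views, 2)   # (B, n_views, 2)
```

Because the renderer is differentiable, the classification loss back-propagates through the rendered views into these angle offsets, which is what makes end-to-end training with any multi-view network possible.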
Learning Semantic Segmentation with Query Points Supervision on Aerial Images
Semantic segmentation is crucial in remote sensing, where high-resolution
satellite images are segmented into meaningful regions. Recent advancements in
deep learning have significantly improved satellite image segmentation.
However, most of these methods are typically trained in fully supervised
settings that require high-quality pixel-level annotations, which are expensive
and time-consuming to obtain. In this work, we present a weakly supervised
learning approach that trains semantic segmentation models using only query
point annotations instead of full mask labels. Our proposed approach
performs accurate semantic segmentation and improves efficiency by
significantly reducing the cost and time required for manual annotation.
Specifically, we generate superpixels and extend the query point labels to the
superpixels that group semantically similar regions. We then train semantic
segmentation models supervised with partially labeled images carrying these
superpixel pseudo-labels. We benchmark our weakly supervised training approach
on an aerial image dataset and different semantic segmentation architectures,
showing that we can reach competitive performance compared to fully supervised
training while reducing the annotation effort.
Comment: Paper presented at the LXCV workshop at ICCV 2023
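The label-expansion step can be sketched with off-the-shelf superpixels: every superpixel containing a query point inherits that point's class, and all remaining pixels are ignored by the loss. A minimal sketch assuming SLIC superpixels (a stand-in for whichever superpixel method is used; the parameters are illustrative):

```python
import numpy as np
from skimage.segmentation import slic

IGNORE = 255  # ignore index for the segmentation loss

def expand_point_labels(image, points, n_segments=500):
    """Extend sparse query-point labels to superpixel pseudo-labels.
    image:  (H, W, 3) array
    points: iterable of (row, col, class_id) annotations
    Returns an (H, W) pseudo-label map; unlabeled pixels keep IGNORE."""
    segments = slic(image, n_segments=n_segments, compactness=10)
    labels = np.full(image.shape[:2], IGNORE, dtype=np.int64)
    for r, c, cls in points:
        labels[segments == segments[r, c]] = cls
    return labels
```

Training then uses a standard cross-entropy loss with ignore_index=IGNORE, so only the pseudo-labeled pixels contribute to the supervision.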